Building A Lexical Domain Map From Text Corpora

Author

  • Tomek Strzalkowski
Abstract

Summary: In information retrieval the task is to extract from the database all, and only, the documents which are relevant to a user query, even when the query and the documents use little common vocabulary. In this paper we discuss the problem of automatic generation of lexical relations between words and phrases from large text corpora and their application to automatic query expansion in information retrieval. Reported here are some preliminary results and observations from the experiments with an 85 million word Wall Street Journal database and a 45 million word San Jose Mercury News database (parts of the 0.5 billion word TIPSTER/TREC database).

Similar resources

Published vs. Postgraduate Writing in Applied Linguistics: The Case of Lexical Bundles

Abstract: Lexical bundles, as building blocks of coherent discourse, have been the subject of much research in the last two decades. While many of such studies have been mainly concerned with exploring variations in the use of these word sequences across different registers and disciplines, very few have addressed the use of some particular groups of lexical bundles within some gen...

Building domain specific lexical hierarchies from corpora

In this article, we present a new algorithm for building domain specific lexical hierarchies from texts. The basic elements of such a hierarchy are the normalized terms – mono and multi-word terms – extracted from a large corpus by a terminological extractor. The algorithm relies on collocations for representing the meaning of these terms, finding hierarchical relations between them and finally...

User-Centered Analysis of Corpora Using Semantic Features Redundancy

Accessing textual information is still a complex activity when users have to browse through large corpora or long texts. In order to help users in such tasks, we propose a model dedicated to lexical representation of thematic domains as well as tools for personal corpora analysis. The lexical model is a differential one, inspired by Saussure's semiotics. It consists in structuring and describin...

Developing Domain-Specific Gesture Recognizers for Smart Diagram Environments

Computer understanding of visual languages in pen-based environments requires a combination of lexical analysis in which the basic tokens are recognized from hand-drawn gestures and syntax analysis in which the structure is recognized. Typically, lexical analysis relies on statistical methods while syntax analysis utilizes grammars. The two stages are not independent: contextual information pro...

Mining Social Deliberation in Online Communication - If You Were Me and I Were You

Social deliberative skills are collaborative life-skills. These skills are crucial for communicating in any collaborative processes where participants have heterogeneous opinions and perspectives driven by different assumptions, beliefs, and goals. In this paper, we describe models using lexical, discourse, and gender demographic features to identify whether or not participants demonstrate soci...

Publication date: 1994